ABSTRACT

A live tailored testing study was conducted to compare the results of using
either the one-parameter or the three-parameter logistic model to measure the
performance of college students on multiple choice vocabulary items. The study
showed the three-parameter tailored testing procedure to be superior to the
one-parameter procedure on the basis of goodness of fit of observed to predicted
item responses, test-retest reliability, convergence to stable ability
estimates, and test information. No differences were found in the prediction of
an outside criterion. However, implicit in these results was the assumption
that the nonconvergence problem encountered in one-third of the cases for the
three-parameter procedure could be solved. Thus, based on the data reported in
this study, the three-parameter tailored testing method was deemed the technique
of choice, at least for unidimensional tests consisting of multiple choice items
where guessing is a factor. (Author)
Tailored testing derives its name from its primary aim and characteristic,
which is to attempt to "tailor" a test for a given individual, often using computer
capabilities. That is, rather than administering the same set of test items to all
examinees, the tailored testing procedure presents a unique set of items that tries
to match item difficulty levels to a person's ability. "An examinee is measured
most effectively when the test items are neither too difficult nor too easy for
him" (Lord, 1970). Thus, one goal of the tailored testing procedure is to select
items from a precalibrated item pool stored in the computer so that the probability
of a correct response by the examinee is .50 on each item. In general, tailored
testing procedures require three components: a pool of calibrated items, an
item selection technique, and a scoring method (Patience, 1977).

Although several tailored testing procedures have been developed, most of the
procedures employ either a one-parameter or a three-parameter logistic model for
item calibration and ability estimation purposes. However, no empirical studies
have been reported in the literature that directly compare these two tailored testing
models on the basis of their relative performances and characteristics in actual
live-testing settings. The primary purpose of the present study, therefore, was
to deal with this issue and hopefully collect evidence for the recommendation of
one model over the other in this specific situation. We begin with a brief discussion
of the two latent trait models.
The Rasch model (1960), or one-parameter logistic model, is thoroughly described
in a recent article by Wright (1977). Here let it suffice to say that the
one-parameter model requires only one ability parameter $\theta_j$ for each person and
one item difficulty parameter $b_i$ to describe the interaction between an examinee
and a test item. The exponential form of the simple logistic model is

$$P\{u_{ij}\} = \frac{e^{u_{ij}(\theta_j - b_i)}}{1 + e^{(\theta_j - b_i)}} \tag{1}$$

where $u_{ij}$ is the score (0 or 1) on Item i by Person j, $\theta_j$ and $b_i$ are as defined
above, and $P\{u_{ij}\}$ is the probability of a correct or incorrect response.
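As a small illustration of the one-parameter form, the following Python sketch evaluates the response probability for given ability and difficulty values. The function name `rasch_prob` and the example values are illustrative assumptions, not part of the study's software.

```python
import math

def rasch_prob(u, theta, b):
    """Probability of response u (1 = correct, 0 = incorrect) under the
    one-parameter logistic model with ability theta and difficulty b."""
    return math.exp(u * (theta - b)) / (1.0 + math.exp(theta - b))

# When ability equals difficulty, a correct response has probability .50.
p_correct = rasch_prob(1, 0.5, 0.5)
p_incorrect = rasch_prob(0, 0.5, 0.5)
```

The two probabilities for a given item and person sum to one, which is why a single expression with $u_{ij}$ in the exponent can cover both outcomes.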

In contrast, the three-parameter logistic model presented by Birnbaum (1968)
requires the estimation of three item parameters to describe the interaction between
test items and examinees. The model is given by

$$P_{ij} = P\{u_{ij} = 1\} = c_i + (1 - c_i)\,\frac{e^{D a_i(\theta_j - b_i)}}{1 + e^{D a_i(\theta_j - b_i)}} \tag{2}$$

where $P\{u_{ij} = 1\}$ is the probability of a correct response by Person j to Item i;
$c_i$ is the guessing parameter for Item i; D is a scaling constant equal to 1.7;
$a_i$ is the item discrimination parameter; $b_i$ is the item difficulty parameter; and
$\theta_j$ is the ability parameter for Person j. $Q_{ij}$, the probability of an incorrect
response, is defined simply as $1 - P_{ij}$.
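The three-parameter response function can be sketched in the same way. The function name `three_pl_prob` and the item values below are hypothetical; only the scaling constant D = 1.7 comes from the text.

```python
import math

D = 1.7  # scaling constant given in the text

def three_pl_prob(theta, a, b, c):
    """Probability of a correct response under the three-parameter
    logistic model (discrimination a, difficulty b, guessing c)."""
    z = D * a * (theta - b)
    # 1 / (1 + e^-z) is algebraically the same as e^z / (1 + e^z).
    return c + (1.0 - c) / (1.0 + math.exp(-z))

# Hypothetical item: a = 1.0, b = 0.0, c = .20 (the chance level on a
# five-alternative item if guessing were purely random).
p_at_difficulty = three_pl_prob(0.0, 1.0, 0.0, 0.20)
q_at_difficulty = 1.0 - p_at_difficulty
```

Note that when ability equals difficulty the probability is $c_i + (1 - c_i)/2$ rather than .50, and that for very low ability the probability approaches the lower asymptote $c_i$.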

Both models have in common the assumptions that the items may be scored
dichotomously, that the latent trait being measured by the items is unidimensional,
that item parameters remain invariant across groups of examinees, and that local
independence holds (Lord and Novick, 1968).
The bases for the comparisons of the two tailored testing procedures will be
(a) the goodness of fit of the models using mean squared deviations of observed
from predicted response data, (b) the reliabilities of the two methods, (c) the
ability estimates yielded by the two procedures, (d) the correlation of the ability
estimates with the same outside criterion, (e) descriptive statistics for each
procedure, (f) the rates at which the two methods converge to ability estimates,
and (g) the information functions for the two procedures.

Method

Item Calibrations

The source of items used for the tailored testing comparison study was the
Syracuse Adult Development Study vocabulary tests, Forms C2, D2, and E (1972).
All of the items were of the multiple-choice form with five alternatives per item.
A principal components factor analysis of the inter-item tetrachoric correlation
coefficients conducted on Form D2 indicated that only one factor was present in
the test, accounting for approximately 41% of the variance, with a sample size of
1,000 (Reckase, 1972).

Two identical pools of 72 vocabulary items were constructed, one for use with
the one-parameter model and the other for the three-parameter model. The
one-parameter pool was calibrated using a modified version of a program given in an
article by Wright and Panchapakesan (1969). For the three-parameter pool, the
LOGIST program developed by Wood, Wingersky, and Lord (1976) was used. Table 1
presents the means, standard deviations, and ranges of the item parameter estimates
resulting from the two calibration procedures, along with the sample sizes upon
which they were based.
Specific Tailored Testing Procedures

For the one-parameter procedure, items were selected for administration based
on difficulty values ($b_i$). The procedure began with an ability estimate of ±.50
for the examinees, depending on the experimental condition to which they had been
assigned. Thus, the first item administered was the first one encountered in the
pool that was equal to the initial ability estimate, within a ±.30 acceptance
range. If the examinee answered the first item correctly, the next item administered
was the item in the pool at a fixed stepsize away (.693) in a positive direction,
i.e., a more difficult item, still within the acceptance range. On the other hand,
an incorrect response led to the next item that was -.693 away, i.e., an easier item.
The .693 fixed stepsize value had been previously determined through an analysis
of tailored testing operation (Reckase, 1976).
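The fixed-stepsize stage described above can be sketched as follows. This is a minimal illustration, not the program used in the study: the function names, the pool values, and the choice to apply the step to the current target value (rather than to the difficulty of the item just administered) are all assumptions.

```python
STEP = 0.693       # fixed stepsize reported in the text
TOLERANCE = 0.30   # acceptance range around the target difficulty

def first_item_within(difficulties, target, used, tol=TOLERANCE):
    """Return the index of the first unused item whose difficulty lies
    within tol of target, or None if the pool has no such item."""
    for idx, b in enumerate(difficulties):
        if idx not in used and abs(b - target) <= tol:
            return idx
    return None

def next_target(current, correct, step=STEP):
    """Move the target up after a correct response, down after an error."""
    return current + step if correct else current - step

# Hypothetical five-item pool of difficulty values.
pool = [-1.0, 0.4, 0.6, 1.3, -0.2]
first = first_item_within(pool, 0.50, used=set())
harder = first_item_within(pool, next_target(0.50, True), used={first})
easier = first_item_within(pool, next_target(0.50, False), used={first})
```

Returning `None` when no item falls inside the acceptance range mirrors the stopping condition used later in the procedure.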

When at least one item had been answered correctly and one incorrectly, the
ability level of the examinee was estimated using an empirical maximum likelihood
procedure. The technique used was an iterative search to determine the mode of
the likelihood distribution, which became the new ability estimate. The next item
administered was one selected so that it had probability .50 of being answered
correctly. For the one-parameter model this was an item with difficulty equal to
the ability estimate, within the ±.30 acceptance range of easiness. The tailored
test was terminated when no items remained in the pool that fell within the ±.30
range or when a maximum of 20 total items had been administered.
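To make the search for the mode of the likelihood concrete, here is a hedged sketch for the one-parameter model. The study's actual search routine is not described, so the simple grid search, its bounds, and its step size are assumptions chosen for clarity rather than efficiency.

```python
import math

def log_likelihood(theta, responses):
    """Log-likelihood of ability theta under the one-parameter model.
    responses is a list of (u, b) pairs: score u in {0, 1}, difficulty b."""
    total = 0.0
    for u, b in responses:
        total += u * (theta - b) - math.log(1.0 + math.exp(theta - b))
    return total

def ml_ability(responses, lo=-4.0, hi=4.0, step=0.01):
    """Grid search for the theta that maximizes the likelihood.
    With all-correct or all-incorrect strings the likelihood has no
    interior mode and the search simply returns a grid boundary."""
    n = int(round((hi - lo) / step))
    grid = [lo + k * step for k in range(n + 1)]
    return max(grid, key=lambda t: log_likelihood(t, responses))

# Hypothetical string: item at b = .50 correct, item at b = 1.193 incorrect.
est = ml_ability([(1, 0.5), (0, 1.193)])
```

For this two-item string the estimate lands midway between the two difficulties, which illustrates why at least one correct and one incorrect response are needed before an estimate can be obtained.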

For the three-parameter procedure, items were selected for administration
based on values of the information function. Actually, this was equivalent to the
one-parameter item selection procedure, since, for the one-parameter model, selecting
items to maximize the information that an item provided about a person's ability
was the same as selecting items on the basis of appropriate easiness value. That
is, the information function was maximal for the one-parameter model when the
difficulty of the item administered equalled the ability estimate.

However, for the three-parameter model, the information function was more
complex; in particular, the added discrimination and guessing parameters played a
crucial role in determining the amplitude of the information curve. The formula
used to compute item information for the three-parameter logistic model was given
in Birnbaum (1968) as



$$I(\theta_j, u_i) = D^2 a_i^2\,\psi[DL_i(\theta_j)] - D^2 a_i^2\,P_{ij}(\theta_j)\,\psi[DL_i(\theta_j) - \log c_i] \tag{3}$$

where $I(\theta_j, u_i)$ is the information of Item i at ability level $\theta_j$ for Person j given
item response $u_i$; $L_i(\theta_j) = a_i(\theta_j - b_i)$; $P_{ij}(\theta_j)$ is the probability of a correct
response to Item i given ability level $\theta_j$; $\psi(x)$ is the logistic probability density
function; and the other parameters have their meanings mentioned previously. The
total test information was then simply the sum of the item information (Birnbaum,
1968) given by

$$I(\theta) = \sum_{i=1}^{n} I(\theta_j, u_i) \tag{4}$$

The tailored testing procedure for the three-parameter model began the same
way as described above. Namely, a fixed .693 stepsize was used to select items
until at least one correct and one incorrect response had been obtained. Ability
estimates were again computed using the maximum likelihood technique. However, to
select the next item to be administered, the item pool was searched for the item
which had the most information (i.e., $I(\theta_j, u_i)$ was maximal) for that particular
ability estimate. This process was repeated until either no item was available
in the pool with $I(\theta_j, u_i) > .70$ or until a total of 20 items had been administered.
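The maximum-information selection rule with its .70 cutoff might be sketched as below. The pool, the function names, and the use of the standard closed form for three-parameter item information are assumptions made for illustration; the 20-item limit would be enforced by the calling loop.

```python
import math

D = 1.7  # scaling constant given in the text

def info_3pl(theta, a, b, c):
    """Three-parameter item information in its standard closed form."""
    p = c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def next_item(theta, pool, used, cutoff=0.70):
    """Return the index of the unused item with the most information at
    theta, or None when no unused item exceeds the cutoff (stop testing)."""
    best, best_info = None, cutoff
    for idx, (a, b, c) in enumerate(pool):
        if idx in used:
            continue
        info = info_3pl(theta, a, b, c)
        if info > best_info:
            best, best_info = idx, info
    return best

# Hypothetical pool of (a, b, c) triples.
pool = [(1.5, 0.0, 0.2), (0.5, 0.0, 0.2), (1.5, 3.0, 0.2)]
chosen = next_item(0.2, pool, used=set())
after_first = next_item(0.2, pool, used={0})
```

In this example only the first item is informative enough at the current estimate, so once it has been used the rule signals termination.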
Design

The study employed a counterbalanced design in which there were two separate
sessions one week apart for each examinee, with both the one- and three-parameter
tests administered at each session. The order of test presentation was reversed
from one session to the next for each examinee, but the test was arranged so that
the examinees were not aware of receiving two tests. The second test was initiated
immediately after a final ability estimate was obtained from the first test. The
tests were all administered on ADDS Consul 980 cathode ray tube terminals connected
to an IBM 370/168 computer through a timesharing system.

The subjects who participated in the study were undergraduate and graduate
students enrolled in educational psychology and measurement courses at the University
of Missouri-Columbia. A total of 142 students took part in the study, but 14 cases
were deleted due to missing data, resulting in 128 net examinees. All students
received extra credit for their participation.

Analyses

The measure used to determine the goodness of fit of the observed response
data to the models was the mean squared deviation (MSD) statistic given by

$$\mathrm{MSD}_j = \frac{1}{N}\sum_{i=1}^{N}\left(u_{ij} - P_{ij}\right)^2$$

where $\mathrm{MSD}_j$ was the mean squared deviation for Person j; $u_{ij}$ was the actual response;
$P_{ij}$ was the predicted response from the model; and N was the number of items from
the tailored test. Two MSD statistics were calculated for each examinee, one for
each model from the first test session. A systematic sample of 22 examinees was
taken to compare the two models using the MSD criterion in a t-test analysis, since
it was desired that MSD values be computed across the range of ability estimates
yielded by the tailored tests.
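The MSD statistic is simple to compute; the sketch below is illustrative, with a hypothetical function name and made-up response data.

```python
def msd(responses, predicted):
    """Mean squared deviation between observed item scores (0 or 1)
    and model-predicted probabilities of a correct response."""
    n = len(responses)
    return sum((u - p) ** 2 for u, p in zip(responses, predicted)) / n

# Hypothetical four-item test in which every item was perfectly targeted,
# i.e., each predicted probability of a correct response is .50.
targeted = msd([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])

# Hypothetical case where predictions agree more closely with the responses.
closer = msd([1, 0, 1, 0], [0.8, 0.3, 0.7, 0.2])
```

When every predicted probability is exactly .50, each squared deviation is .25 regardless of the response, so the statistic equals .25; values below that indicate predictions that lean toward the responses actually observed.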

The reliability comparison of the two models was not a true test-retest
reliability, but rather was a hybrid of test-retest and equivalent forms reliability.
It was impossible for an examinee to receive exactly the same tailored test twice
due to differences in entry points into the item pool and to changes in response
strings. However, numerous items were repeated over test sessions as a function
of the consistency in ability estimation for a person, since items were selected
from the same pool. Several descriptive statistics were also computed for the two
testing procedures, such as average test length, average difficulty, and percentage
of test items in common over the two sessions. Where differences were found, the
effects on reliability were partialed out.

Correlation analyses were conducted between ability estimates yielded by
the one- and three-parameter models over the two test sessions, as well as between
the ability estimates and an outside criterion of performance, namely, traditional
paper and pencil exam scores over course material. The purpose of these correlations
was to determine the degree to which the two test procedures were measuring the
same thing, and whether one model did better than the other in prediction of the
criterion.

Information function analyses were performed to compare the two models in terms
of relative efficiency, the ratio of tailored test information to total test information
(Lord, 1970). A plot was constructed of the relative efficiency of both the one-
parameter and the three-parameter tailored tests against the same 30-item traditional
vocabulary test. Again, data for the plot were selected with a systematic rather
than random sample to insure broad coverage over the range of ability estimates.



Convergence plots were drawn for the tailored tests taken by each examinee
over both sessions. On one axis were plotted the ability estimates calculated
after each item was administered, and on the other axis were plotted the items
received. The purpose was to provide a graphic description of the rates at which the
two models converged to stable ability estimates. Direct comparisons in this regard
were not possible since the one- and three-parameter ability estimates were on
different scales. However, representative plots were selected and subjective
summary judgements were made.

Results 



Goodness of Fit

The results of the MSD statistic to compare the goodness of fit of the one-
and the three-parameter models are presented in Table 2. The MSD values are shown
for 22 cases along with descriptive statistics and the results of a paired samples
t-test analysis on the data. The t-test showed that the MSD statistic was
significantly smaller (p < .05) for the three-parameter model, indicating better
fit of the model to the observed response data.
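The paired-samples t statistic can be recomputed from the 22 pairs of MSD values listed in Table 2. The sketch below uses those values as transcribed from the table; a few digits in the source are hard to read, so the result should be taken as a consistency check rather than an exact reproduction. The function name is an assumption.

```python
import math

# MSD values as transcribed from Table 2 (one pair per sampled examinee).
one_pl = [.198, .197, .212, .214, .083, .203, .202, .187, .208, .204, .192,
          .083, .215, .196, .164, .194, .203, .203, .183, .214, .182, .188]
three_pl = [.184, .206, .158, .100, .143, .098, .208, .156, .153, .140, .171,
            .133, .267, .191, .198, .144, .166, .126, .247, .149, .022, .185]

def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for x minus y."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((di - mean_d) ** 2 for di in d) / (n - 1)
    return mean_d / math.sqrt(var_d / n), n - 1

t_value, df = paired_t(one_pl, three_pl)
```

With these transcribed values the statistic comes out close to the value reported in Table 2, on 21 degrees of freedom.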



Insert Table 2 about here 



Reliability

The correlation matrix in Table 3 consists of the coefficients obtained from
intercorrelating the various ability estimates yielded in the tailored tests from
the two models. Of special interest is the correlation between the ability
estimate from the first one-parameter logistic tailored test (1PL 1) and the second
one-parameter logistic tailored test (1PL 2). The .61 value shown in Table 3 is the
reliability coefficient for the one-parameter logistic tailored test. This is
significantly lower (p < .05) than the .77 reliability coefficient obtained by
correlating the ability estimates from the first three-parameter logistic tailored
test (3PL 1) and the second corresponding test (3PL 2).



Insert Table 3 about here



It is very important to note, however, that these reliabilities are based on
only 89 rather than 128 cases. The difference is due to the failure of the three-
parameter tailored test to converge at ability estimates for 39 cases. The
nonconvergence problem was common when using maximum likelihood ability estimation for
the three-parameter model when very difficult items were encountered which substantially
raised the lower asymptote of the logistic function, $c_i$, the chance of obtaining
a correct response by random guessing. In such cases, the mode of the likelihood
distribution could not be found, and the estimation procedure did not yield an
ability estimate. The values in parentheses in Table 3 indicate the reliability
coefficients obtained when the 39 nonconvergence cases remain in the analyses. The
three-parameter reliability now drops from .77 to only .36. The one-parameter
reliability also drops slightly from .61 to .55. However, the difference between
the reliabilities for 128 cases (.36 vs. .55) is not statistically significant.

Since it was common for each tailored test administered to an examinee to have
different numbers of test items, and since test length often impacts on reliability,
another comparison was undertaken in which ability estimates were equated for test
length. The correlation between the first and second one-parameter tailored test
ability estimates, .61, was compared to the correlation between the first and
second three-parameter ability estimates for tests with an equal number of items
presented (1PLEQI vs. 3PLEQI). The resulting difference between these
correlations, .61 and [remainder of sentence illegible in source].

The number of test items in common from one test to another was also investigated
for a possible effect on reliability, since the three-parameter tests had 85%
of such items in common, compared to only 20% for the one-parameter tests. Partial
correlation coefficients were computed to factor out the effects of repeated test
items on the overall reliabilities, but the results showed this variable to have
no effect.
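The text does not say how the partial correlations were computed. One common choice, assumed here purely for illustration, is the first-order partial correlation, where x and y are the two ability estimates and z is a third variable such as the proportion of repeated items. The function name and the numeric inputs are hypothetical.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return numerator / denominator

# If the third variable is uncorrelated with both estimates, partialing it
# out leaves the original correlation unchanged.
unchanged = partial_corr(0.61, 0.0, 0.0)

# Hypothetical case in which z is moderately related to both estimates.
adjusted = partial_corr(0.50, 0.50, 0.50)
```

A finding that the partialed coefficients match the original ones, as reported above, corresponds to the first case: the repeated-items variable carries essentially no shared variance with the ability estimates.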

Table 4 presents several additional descriptive statistics for the one- and the
three-parameter tests. For example, the mean test difficulty for both procedures
was about the same, close to .50. This indicated that, in general, items of
appropriate difficulty were being administered. Also note that the three-parameter
tests tended to be slightly longer than the one-parameter tests.



Insert Table 4 about here




Other Correlation Analyses

Table 3 illustrates the general similarity among all the ability estimate
intercorrelations, regardless of the procedure. The correlations between the ability
estimates yielded by the one-parameter tests and the three-parameter tests
consistently fall in the range from .44 up to .70. Not shown in the table, but also
computed, were the correlations between the ability estimates yielded by the
tailored tests and the outside criterion of scores on traditional course exams.
These correlations were consistently in the .30's for both procedures over both
sessions, meaning that both the one-parameter and the three-parameter tests
predicted the outside criterion equally well.
Information Function Analyses

The results of the relative efficiency comparison are shown in Figure 1. The
horizontal dashed line indicates the information of the traditional 30-item
vocabulary test as the reference position to compare the two types of tailored tests.
However, the ability scales used for plotting the two relative efficiency curves
are not the same. The plot shows that the three-parameter tailored test yielded
substantially greater information than the traditional test, but only in a peaked
fashion for ability estimate levels between -2.0 and +.50, falling off sharply
outside this range. At no point, however, did the one-parameter tailored test
exceed the traditional test information, and its information curve was rectangular,
rather than peaked. Also shown in Figure 1 are the frequency distributions of
ability estimates obtained from the two procedures. Note that the information from
the three-parameter test is greatest where most of the examinees were concentrated.



Insert Figure 1 about here



Convergence Plots

In Figure 2 are pictured four individual tailored testing convergence plots,
including good and poor examples of convergence for each of the two types of tailored
tests. Plot 2-A shows a case where neither procedure converged very well, 2-B
a case where the one-parameter test did well but not the three-parameter test,
2-C a case in which the three-parameter test converged better than the one-
parameter test, and 2-D a case where both procedures converged nicely. A subjective
classification method applied to 44 separate cases resulted in the following
breakdown: 2-A, 7 plots; 2-B, 5 plots; 2-C, 18 plots; and 2-D, 14 plots. However,
recall that in 39 cases, not included in the above categories, the three-parameter
tailored testing procedure failed to converge at all.



Insert Figure 2 about here 

Discussion

Theoretically the MSD statistic had a possible range in value from 0 to 1,
0 for perfect fit and 1 for perfect lack of fit. In actual practice, however, the
value of the MSD for an examinee rarely exceeded .25 for either model. Although the
sampling distribution of the MSD statistic was unknown, previous research has
shown the distribution to be approximately normal (Reckase, 1977). Thus the t-test
results may be interpreted for these data as evidence that the three-parameter
tailored testing procedure did a significantly better job of fitting the response
data than the one-parameter test. The result showed a closer match between the
item responses predicted by the model and the actual observed responses for the
three-parameter tailored test.

The reliability comparison also showed the three-parameter procedure to be
superior, but only when the nonconverging tests, about one-third of the total, were
removed from the data analysis. This superiority held even when the effects of test
length and repeated items were controlled or equated for both procedures.

However, the consistent, moderately high degree of intercorrelation among the
ability estimates yielded by both models over both sessions indicated that both
procedures were measuring the same thing. Moreover, both of the tailored testing
methods correlated equally well with the outside criterion measure. In this regard
it should be noted that high correlations were not expected, since performance
on a general vocabulary test would not necessarily lead to similar performances
on course achievement tests. However, the achievement test scores were the only
outside criterion available for the examinees.
The descriptive statistics for the two tailored testing procedures showed the
three-parameter tests to be slightly longer on the average, although test length
differences would best be interpreted as being a function of the different item
selection methods and stopping rules employed. Since the ±.30 acceptance range
for the one-parameter method and the .70 information level cutoff for the three-
parameter method were both somewhat arbitrary values derived from simulation and
empirical studies, changes in these values would have changed the number of items
administered. Both procedures functioned well on the average in administering
items of appropriate difficulty (near .50) for the examinees.

The relative efficiency comparison of the two procedures based on their
respective test information curves showed that neither type of tailored test
provided as much information across the broad range of ability estimates as did the
traditional test. However, the three-parameter procedure did exceed the traditional
test information for a limited range of abilities, the range in which most persons
were concentrated, while in no case did the one-parameter test information do so.

The subjective analysis of the convergence plots on the whole indicated that
the three-parameter tailored tests did a better job of arriving at stable ability
estimates than the one-parameter tests. Of course, this result held only when 39
nonconvergence cases were removed from the data analysis. If included, the one-
parameter tailored test convergence patterns would have been superior.



Summary and Conclusion

A live tailored testing study was conducted to compare the results of using
either the one-parameter logistic model or the three-parameter logistic model to
measure the performance of college students on multiple choice vocabulary items.
The results of the study showed the three-parameter tailored testing procedure to
be superior to the one-parameter procedure on the basis of goodness of fit of observed
to predicted item responses, test-retest reliability, convergence to stable ability
estimates, and test information. No differences were found in the prediction of an
outside criterion. However, implicit in these results was the assumption that the
nonconvergence problem encountered in one-third of the cases for the three-parameter
procedure could be solved. Thus, based on the data reported in this study, the
three-parameter tailored testing method was deemed the technique of choice, at least
for unidimensional tests consisting of multiple choice items where guessing is a
factor.



Table 1

Descriptive Statistics of Item Parameter
Estimates for the Two Models

                        One Parameter         Three Parameter Model
                            Model            a           b           c

Mean                        -.172          .990       -.519        .121
Standard Deviation          1.467          .533       1.529        .042
Low                        -2.821          .118      -3.624        .023
High                        3.559         2.000       5.952        .270
Sample Size                 1,000         1,541       1,541       1,541
No. of Items                   72            72          72          72

Note. The LOGIST program imposes the restriction that discrimination estimates
must stay in the range from .01 to 2.00.

Table 2

Goodness of Fit Comparison
Using the MSD Statistic

Observations          One Parameter MSD     Three Parameter MSD

     1                      .198                   .184
     2                      .197                   .206
     3                      .212                   .158
     4                      .214                   .100
     5                      .083                   .143
     6                      .203                   .098
     7                      .202                   .208
     8                      .187                   .156
     9                      .208                   .153
    10                      .204                   .140
    11                      .192                   .171
    12                      .083                   .133
    13                      .215                   .267
    14                      .196                   .191
    15                      .164                   .198
    16                      .194                   .144
    17                      .203                   .166
    18                      .203                   .126
    19                      .183                   .247
    20                      .214                   .149
    21                      .182                   .022
    22                      .188                   .185

Mean                        .188                   .161
Standard Deviation          .055                   .063

t(21) = 2.086 (p < .05)
Table 3

Ability Estimate Correlations

Variables              2         3         4         5         6         7         8

1. 1PL 1        .61(.55)       .96       .53       .57       .58       .53       .59
2. 1PL 2                       .53       .90       .68       .70       .63       .69
3. 1PLEQI 1                              .47       .49       .53       .44       .55
4. 1PLEQI 2                                         --        --        --        --
5. 3PL 1                                                .77(.36)       .90       .76
6. 3PL 2                                                               .79       .96
7. 3PLEQI 1                                                                       --
8. 3PLEQI 2

Note. ( ) indicates the inclusion of 39 cases of non-convergence at an ability
estimate for the three parameter test, with all other correlations
based on 89 cases. Entries shown as -- are illegible in the source.



Table 4

Descriptive Statistics

n = 89

Variable                            One Parameter      Three Parameter
                                    Tailored Test       Tailored Test

Mean # of items administered            15.07               18.39
Mean # of items correct                  7.45                8.95
Mean test difficulty                      .49                 .49
Mean ability estimates                    .44                -.77
Figure 1
Relative Efficiency

Figure 2
Convergence Plots